1 Abstract

A (very) short summary of the report. As an example, you can simply write one sentence for each section in the report.

2 Introduction

In this section, state the questions of interest, the motivation for this analysis, and the potential impact of your results. You can simply rephrase the Project Description with minimal effort. You can also cite published papers or credible articles on the Internet. For instance, you may find this brief very relevant. More can be found by searching the key words “class size”, “education”, and “performance.” See, among others, here for proper citation formats.

3 Background

Project STAR, an acronym for “Student-Teacher Achievement Ratio,” was the first phase of the three-phase Tennessee class size project. This introductory phase, which began in 1985 and extended to 1989, tested the effect of three class configurations on children from kindergarten through third grade. STAR’s inspiration stemmed from a similar study conducted in Indiana in 1981: the two-year project “Prime Time,” which covered kindergarten through third grade and found that students in smaller classes had higher achievement. Building on the results from the nearby state, STAR’s mission was to determine whether smaller class sizes would benefit students in Tennessee, thereby answering lawmakers’ question of whether funding such an expansion of the education system would be worthwhile.

The study took place over four years. Participating schools, which signed up for the full period, had to enroll at least 57 students per grade. This number was required so that each grade could be split into one class of type 1, made up of 13-17 students, and classes of types 2 and 3, each with 22-25 students (13 + 22 + 22 = 57). The difference between the latter two is that type 3 classes included a teacher aide while type 2 classes did not. The teacher aide had no instructional duties beyond supporting the teacher in whichever way was desired.

The dataset for this analysis is provided by the Harvard Dataverse. For the purpose of this analysis, the data will be reduced to the school ID, teacher ID, class type, and first-grade mathematics examination score.

4 Descriptive analysis

Select the variables you find relevant based on your understanding in the Background section. Summarize univariate descriptive statistics for the selected variables (mean, standard deviations, missing values, quantiles, etc.). You can create the table using functions in base R, or use packages (see, e.g., this note).

From the data set, we can easily notice that varying numbers of students are assigned to each teacher. In order to obtain one summary measure with the teacher as the unit, we need to aggregate students’ performance (their math scores in 1st grade).

  • Choose the summary measure to be used. Your options are the mean, median, quantiles, etc. Be sure to justify your choice.
  • Calculate this summary measure for each teacher. You may find the summarise() function helpful (link).

Depending on whether you use the data from AER or from the Harvard Dataverse, you may run into different issues in preprocessing.

  • AER: There are no variables that present teacher ID or class ID. However, it is possible to uniquely identify teachers/classes based on variables experience1, tethnicity1, schoolid1, and star1.

  • Harvard dataverse: You need to read the description to find and download the data set. The data set is in the .sav format with 379 variables. However, you can easily identify teacher/class in 1st grade using the variable g1tchid. A copy of the data set is available here in case the server crashes.
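A minimal preprocessing sketch for the Dataverse route, assuming the file has been downloaded locally (the file name STAR_Students.sav is an assumption; the variable names follow the Dataverse codebook):

```r
library(haven)  # read_sav() reads SPSS .sav files
library(dplyr)

# Read the Dataverse export and keep only the first-grade variables we need;
# "STAR_Students.sav" is a placeholder for the downloaded file's name
star <- read_sav("STAR_Students.sav") %>%
  select(g1schid, g1tchid, g1classtype, g1tmathss) %>%
  mutate(g1classtype = as_factor(g1classtype))
```
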

Report multivariate descriptive statistics for the outcome (the chosen summary measure for each teacher) against key variables (e.g., class types, school IDs).

  • Outcome vs. class types: You can draw boxplots using ggplot2 (link).
  • Outcome vs. school IDs: you may want to report selected summary statistics, since there are many schools and only a handful of teachers/classes per school.

On initial loading of the data, we find that there are 5,003 rows with missing data, many of which are missing all the information needed for our analysis. After omitting these rows, we are left with 6,598 students taught by 339 teachers across 76 schools.
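The counts above can be reproduced along these lines (a sketch assuming a data frame star already restricted to the four variables of interest):

```r
library(dplyr)

# Drop rows with missing values among the selected variables,
# then count the remaining students, teachers, and schools
star <- na.omit(star)
nrow(star)                  # students
n_distinct(star$g1tchid)    # teachers
n_distinct(star$g1schid)    # schools
```
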


For this analysis, we will perform ANOVA on a reduced version of our original dataset, keeping only the grade 1 teacher IDs, school IDs, class types, and math scores. Along with this column reduction, we also remove all rows that contain missing values after variable selection. After this transformation, we are left with 6,598 student-level observations on these four variables. Below we visualize the data to gather insight for our analysis.

We first begin by visualizing the math scores by class type. We can see that the median score for class type 1 (13-17 students) is the highest, followed by class type 3 (22-25 students with a teacher aide). The first and third quartiles are also higher for class type 1 than for the other two class types; class type 2 is again the lowest.

# Visualization of math scores by class type

ggplot(star, aes(x = g1tmathss, fill = g1classtype)) + geom_boxplot() + labs(x = "Score", title = "Score vs. Class Type")

ggplotly(
  ggplot(star, aes(x = g1tmathss, fill = g1classtype)) + geom_histogram(alpha = 0.8, bins = 25) + labs(title = "Histogram of Scores Grouped by Class Type")
)
# Summarise scores (mean, SD, median) for each teacher within each school
star.sum <-
  star %>% group_by(g1classtype, g1schid, g1tchid) %>% summarise(
    mean = mean(g1tmathss),
    std.dev = sd(g1tmathss),
    median = median(g1tmathss)
  )
kable(head(star.sum, 10))
| g1classtype | g1schid | g1tchid  | mean     | std.dev  | median |
|-------------|---------|----------|----------|----------|--------|
| 1           | 112038  | 11203805 | 499.8235 | 29.62101 | 505.0  |
| 1           | 123056  | 12305606 | 534.4706 | 22.13063 | 532.0  |
| 1           | 128076  | 12807604 | 555.2143 | 33.67076 | 559.5  |
| 1           | 128076  | 12807606 | 544.3333 | 37.00322 | 549.0  |
| 1           | 128079  | 12807905 | 522.4667 | 34.98544 | 526.0  |
| 1           | 128079  | 12807907 | 515.1250 | 33.46217 | 512.0  |
| 1           | 130085  | 13008508 | 556.7647 | 63.58609 | 545.0  |
| 1           | 159171  | 15917110 | 544.2667 | 40.97642 | 535.0  |
| 1           | 159171  | 15917111 | 534.6250 | 42.14558 | 530.5  |
| 1           | 161176  | 16117607 | 533.6471 | 31.97448 | 535.0  |
# Visualize the distribution of means of scores 
ggplotly(ggplot(star.sum, aes(x = mean)) + geom_histogram(bins = 19) + labs(x = "Mean", title = "Distribution of Means Grouped by Teacher and School ID"))
# Visualize the distribution of medians of scores
ggplotly(ggplot(star.sum, aes(x = median)) + geom_histogram(bins = 19) + labs(x = "Median", title = "Distribution of Medians Grouped by Teacher and School ID"))
# Plot each teacher's mean score against school ID
ggplotly(ggplot(star.sum, aes(x = as.numeric(g1schid), y = mean)) + geom_point() + labs(x = "School ID", y = "Mean"))
# Summarise by teacher so each teacher contributes one observation to its school
star.sum.sch <-
  star %>% group_by(g1tchid) %>% summarise(
    mean = mean(g1tmathss),
    std.dev = sd(g1tmathss),
    median = median(g1tmathss), 
    g1schid = unique(g1schid),
    g1classtype = unique(g1classtype)
  )

# Schools 244728, 244736, 244796, 244839
table(star.sum.sch$g1schid)
## 
## 112038 123056 128076 128079 130085 159171 161176 161183 162184 164198 165199 
##      3      3      4      4      4      6      4      6      4      3      3 
## 166203 168211 168214 169219 169229 169231 169280 170295 173312 176329 180344 
##      3      5      3      5     12      5      4      4      4      4      6 
## 189378 189382 189396 191411 193422 193423 201449 203452 203457 205488 205490 
##      4      4      4      3      3      5      7      5      3      4      3 
## 205491 205492 208501 208503 209510 212522 215533 216537 218562 221571 221574 
##      3      3      4      3      6      5      6      6      4      6      4 
## 225585 228606 230612 231616 234628 244697 244708 244723 244727 244728 244736 
##      4      4      3      3      6      6      6      6      5      3      3 
## 244745 244746 244755 244764 244774 244776 244780 244796 244799 244801 244806 
##      4      3      7      3      6      6      3      3      4      4      7 
## 244831 244839 252885 253888 257899 257905 259915 261927 262937 264945 
##      4      5      4      3      5      7      4      5      4      5
# View the scores of schools 244728, 244736, 244796, and 244839
# to see if they stand out, then remove them below
star.sch.red <- star.sum.sch[!(
  star.sum.sch$g1schid == 244728 |
    star.sum.sch$g1schid  == 244736 |
    star.sum.sch$g1schid  == 244796 | star.sum.sch$g1schid  == 244839
), ]


5 Inferential analysis

We now perform a two-way ANOVA on class type and school ID to determine whether there is a significant difference in scores among class types across schools.

# The interaction term is found to be non-significant
fullmod <-
  anova(aov(mean ~ g1classtype * g1schid, data = star.sum.sch))
fullmod
# Fit no interaction model
redmod <-
  aov(mean ~ g1classtype + g1schid, data = star.sum.sch)
#redmod$residuals

# Model for the median 
fullmod.med <-
  aov(median ~ g1classtype * g1schid, data = star.sum.sch)
anova(fullmod.med)
# Interaction is still not significant
redmod.med <-
  aov(median ~ g1classtype + g1schid, data = star.sum.sch)
anova(redmod.med)
# Outlier removal
plot(redmod.med) # outliers: 53, 61, 143, 144, 179, 253

star.sum.sch[c(53,61, 143, 144, 179, 253),]
# Outliers are mostly class type 1 with very high scores; obs 143 is class type 2 with a higher-than-average score

# Remove these points
star.red <- star.sum.sch[-c(53,61, 143, 144, 179, 253),]

# Refit the additive median model without the outliers
mod.med.red <-
  aov(median ~ g1classtype + g1schid, data = star.red)
anova(redmod.med)
anova(mod.med.red)
leveneTest(median ~ g1classtype * g1schid, data = star.red)
plot(mod.med.red)
## Warning: not plotting observations with leverage one:
##   140

We can define a two-way ANOVA model as follows \(Y_{ijk} = \mu_{..} + \alpha_{i} + \beta_{j} + \epsilon_{ijk}\), where the index \(i\) represents the class type: small (\(i=1\)), regular (\(i=2\)), regular with aide (\(i=3\)), and the index \(j\) represents the school indicator. You need to explain the rest of the parameters, state constraints on the parameters, and justify the choice of model (e.g., why no interaction terms).
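Under the usual factor-effects parameterization, one common choice of constraints (assumed here) is the sum-to-zero form:

```latex
\sum_{i=1}^{3} \alpha_i = 0, \qquad
\sum_{j} \beta_j = 0, \qquad
\epsilon_{ijk} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)
```

With these constraints, \(\mu_{..}\) is interpreted as the overall mean, \(\alpha_i\) as the class-type effect, and \(\beta_j\) as the school effect.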

State the assumptions on your proposed model.

Fit the model on the Project STAR data and report the fitted results, with some attention to how/whether to report the estimated coefficients for school IDs.

State the hypotheses to test. Please be sure to specify the significance level and interpret your test result. Explain any additional assumptions involved in this test.

For the secondary question of interest, one option is Tukey’s range test (link). You can employ other testing procedures as well. Again, specify the significance level, interpret your test result, and explain any additional assumptions involved in this test.
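As a sketch, assuming the additive model redmod fitted above, Tukey's range test for pairwise class-type differences can be run as:

```r
# Pairwise comparisons of class-type means with a family-wise
# confidence level of 95%; assumes the additive aov fit `redmod`
TukeyHSD(redmod, which = "g1classtype", conf.level = 0.95)
```
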

6 Sensitivity analysis

6.1 Normality

  • Examine the residual plot of the fitted model. You need to explain your findings in these plots (e.g., whether assumptions are plausible).

Sensitivity analysis for the ANOVA model concerns two major assumptions: normality of residuals and homoscedasticity. We check normality with a normal Q-Q plot as well as a formal test, in this case the Shapiro-Wilk test. For the test, we have the following hypotheses: \[ H_0: \text{The data comes from a normal population} \\ H_a: \text{The data does not come from a normal population} \] Below we visualize the normal Q-Q plot for our mean model, where we can see some weight in the tails, caused mainly by a few outliers. These tails are drastic enough to suggest a significant departure from normality. A Shapiro-Wilk test supports this evidence, with a p-value below our significance level of 0.05, leading us to conclude that the residuals of our model are not normally distributed. This departure from normality is not severe enough to warrant jumping to a non-parametric technique, as ANOVA has been shown to be robust against mild departures from normality. We will compare these results with our ANOVA using the median scores.

  • For alternative methods, you can explore
    • other summary measures (say, median instead of mean)
    • or nonparametric approach and check if your answers to the questions of interest change.
# Examine residuals with a normal Q-Q plot

# We can see some deviation from normality, given that we have heavy tails
plot(redmod, which = 2)

# Let's check with a formal test
shapiro.test(redmod$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  redmod$residuals
## W = 0.98016, p-value = 0.0001262
# We reject the null, concluding that the residuals do not come from a normal distribution

We repeat our process of checking normality, first with a Q-Q plot, which already shows approximate normality with some slight heaviness in the tails. These tails are caused mainly by a few outliers, which we will explore further in the last third of this section. Our formal test for normality leads us to fail to reject the null hypothesis, concluding that the residuals plausibly come from a normal population. Our analysis using the median thus appears, for now, to be the better choice of summary measure. We now move on to checking our second condition.

# Let's explore the distribution of the median 
plot(redmod.med, which = 2) # median has much lighter tails

# Shapiro.test
shapiro.test(redmod.med$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  redmod.med$residuals
## W = 0.99321, p-value = 0.1295
# Normality is justifiable for median

6.2 Homoscedasticity

We check for equal variance among groups with two methods: a plot of residuals versus fitted values, and a formal Brown-Forsythe test. Because of the weakness in our normality assumption, we avoid the Bartlett and Levene tests, leaving us with the more robust Brown-Forsythe test. We have the following hypotheses: \[ H_0: \text{The variances are equal} \\ H_a: \text{The variances are not equal} \]

Model Using Mean

We first observe our plot, which shows a promising, evenly dispersed scatter. The red line in the center shows little pattern in the residuals, almost tracing the \(y=0\) line. We do, however, notice some familiar outliers from before, specifically observations 253 and 172. Our Brown-Forsythe test, however, indicates a violation of homoscedasticity, returning a p-value much less than \(\alpha = 0.05\) and leading us to reject our null hypothesis. We move to our next model.
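In the car package, the Brown-Forsythe test is Levene's test with the group medians as centers (car's default); a sketch assuming the teacher-level summary star.sum.sch from above:

```r
library(car)

# Brown-Forsythe: Levene's test centered at the group medians
leveneTest(mean ~ g1classtype, data = star.sum.sch, center = median)
```
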

7 Discussion

Conclude your analysis in this section. You can touch on the following topics.

  • A brief recap of this project.
  • Findings in the inferential analysis interpreted in the context of Project STAR.
  • Suggestions for future research and/or policy making given your findings.
  • Caveats of the current analysis.

Acknowledgement

By default, it is assumed that you have discussed this project with your teammates and instructors. List any other people that you have discussed this project with.

Reference

List any references you cited in the report. See here for the APA format, as an example:

Imbens, G., & Rubin, D. (2015). Stratified Randomized Experiments. In Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (pp. 187-218). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139025751.010

Session info

Report information of your R session for reproducibility.

sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] plotly_4.10.0  ggplot2_3.4.0  haven_2.5.0    knitr_1.40     dplyr_1.0.10  
##  [6] AER_1.2-10     survival_3.4-0 sandwich_3.0-2 lmtest_0.9-40  zoo_1.8-10    
## [11] car_3.0-13     carData_3.0-5 
## 
## loaded via a namespace (and not attached):
##  [1] lattice_0.20-45   tidyr_1.2.0       assertthat_0.2.1  digest_0.6.29    
##  [5] utf8_1.2.2        R6_2.5.1          evaluate_0.15     highr_0.9        
##  [9] httr_1.4.3        pillar_1.7.0      rlang_1.0.6       lazyeval_0.2.2   
## [13] rstudioapi_0.13   data.table_1.14.2 jquerylib_0.1.4   Matrix_1.5-1     
## [17] rmarkdown_2.14    labeling_0.4.2    splines_4.2.2     readr_2.1.2      
## [21] stringr_1.4.0     htmlwidgets_1.5.4 munsell_0.5.0     compiler_4.2.2   
## [25] xfun_0.31         pkgconfig_2.0.3   htmltools_0.5.2   tidyselect_1.1.2 
## [29] tibble_3.1.7      fansi_1.0.3       viridisLite_0.4.0 crayon_1.5.1     
## [33] tzdb_0.3.0        withr_2.5.0       grid_4.2.2        jsonlite_1.8.0   
## [37] gtable_0.3.0      lifecycle_1.0.3   DBI_1.1.2         magrittr_2.0.3   
## [41] scales_1.2.0      cli_3.3.0         stringi_1.7.6     farver_2.1.0     
## [45] bslib_0.3.1       ellipsis_0.3.2    generics_0.1.2    vctrs_0.5.0      
## [49] Formula_1.2-4     tools_4.2.2       forcats_1.0.0     glue_1.6.2       
## [53] purrr_0.3.4       crosstalk_1.2.0   hms_1.1.1         abind_1.4-5      
## [57] fastmap_1.1.0     yaml_2.3.5        colorspace_2.0-3  sass_0.4.1